4 research outputs found

    Interpretability of API Call Topic Models: An Exploratory Study

    Topic modeling is an unsupervised method for discovering semantically coherent combinations of words, called topics, in unstructured text. However, the human interpretability of topics discovered from non-natural language corpora, specifically Windows API call logs, is unknown. Our objective is to explore the coherence of topics and their ability to represent the themes of API calls from malware analysts’ perspective. Three Latent Dirichlet Allocation (LDA) models were fit to a collection of dynamic API call logs. Topics, or behavioral themes, were manually evaluated by malware analysts. The results were compared to existing automated quality measures. Participants were able to accurately determine API calls that did not belong in behavioral themes learned by the 20-topic model. Our results agree with topic coherence measures on which topics are the most interpretable. The results are not compatible with log-perplexity, which concurs with the findings of the topic evaluation literature on natural language corpora.

    Forensicloud: An Architecture for Digital Forensic Analysis in the Cloud

    The amount of data that must be processed in current digital forensic examinations continues to rise. Both the volume and diversity of data are obstacles to the timely completion of forensic investigations. Additionally, some law enforcement agencies do not have the resources to handle cases of even moderate size. To address these issues, we have developed an architecture for a cloud-based distributed processing platform that we have named Forensicloud. This architecture is designed to reduce the time taken to process digital evidence by leveraging the power of a high performance computing platform and by adapting existing tools to operate within this environment. Forensicloud’s Software as a Service and Infrastructure as a Service models allow investigators to use remote virtual environments for investigating digital evidence. These environments give investigators access to licensed and unlicensed tools that they may not have had before and allow some of these tools to be run on computing clusters.

    Automating Malware Detection in Windows Memory Images using Machine Learning

    Malicious software, or malware, is often employed as a tool to maintain access to previously compromised systems. It enables intruders to utilize system resources, harvest legitimate credentials, and maintain a level of stealth throughout the process. During incident response, identifying systems infected with malware is necessary for effective remediation of an attack. When analysts lack sufficient indicators of compromise, they are forced to conduct a comprehensive examination to identify anomalous behavior on a system, a time-consuming and challenging task. Malware authors use several techniques to conceal malware on a system, with a common method being DLL injection. In this dissertation we present a system for automatically generating Windows 7 x86 memory images infected with malware, identifying the malicious DLLs injected into a process, and extracting the features associated with those DLLs. A set of 3,240 infected memory images was produced and analyzed to identify common characteristics of malicious DLLs in memory. From this analysis a feature set was constructed, and two datasets were used to evaluate five classification algorithms. The ZeroR method was used as a baseline for comparison, with accuracy and false positive rate (misclassifying malicious DLLs as legitimate) being the two metrics of interest. The results of the experiments showed that learning using the feature set is viable and that the performance of the classifiers can be further improved through the use of feature selection. Each of the classification methods outperformed the ZeroR method, with the J48 Decision Tree obtaining the best overall results.
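To make the baseline comparison concrete, here is a minimal sketch of a ZeroR-style baseline and the two metrics named in the abstract. It is not the dissertation's code: the function names, the label strings, and the toy data are all hypothetical, and the false positive rate follows the abstract's wording (malicious DLLs misclassified as legitimate).

```python
from collections import Counter


def zero_r_fit(labels):
    """Return the majority class; ZeroR ignores every feature."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_positive_rate(y_true, y_pred,
                        malicious="malicious", legitimate="legitimate"):
    """Fraction of malicious DLLs that were predicted as legitimate."""
    malicious_total = sum(t == malicious for t in y_true)
    if malicious_total == 0:
        return 0.0
    missed = sum(t == malicious and p == legitimate
                 for t, p in zip(y_true, y_pred))
    return missed / malicious_total


# Hypothetical toy labels, not data from the study.
train = ["legitimate"] * 8 + ["malicious"] * 2
test = ["legitimate"] * 3 + ["malicious"] * 2

majority = zero_r_fit(train)
baseline_pred = [majority] * len(test)
baseline_acc = accuracy(test, baseline_pred)
baseline_fpr = false_positive_rate(test, baseline_pred)
```

On imbalanced labels like these, the baseline can reach a respectable accuracy while missing every malicious DLL, which is one plausible reason to report both metrics side by side.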

    CHARACTERISTICS OF MALICIOUS DLLS IN WINDOWS MEMORY

    Part 3: FORENSIC TECHNIQUES. Dynamic link library (DLL) injection is a method of forcing a running process to load a DLL into its address space. Malware authors use DLL injection to hide their code while it executes on a system. Due to the large number and variety of DLLs in modern Windows systems, distinguishing a malicious DLL from a legitimate DLL in an arbitrary process is non-trivial and often requires the use of previously-established indicators of compromise. Additionally, the DLLs loaded in a process naturally fluctuate over time, adding to the difficulty of identifying malicious DLLs. Machine learning has been shown to be a viable approach for classifying malicious software, but it has not as yet been applied to malware in memory images. In order to identify the behavior of malicious DLLs that were injected into processes, 33,160 Windows 7 x86 memory images were generated from a set of malware samples obtained from VirusShare. DLL artifacts were extracted from the memory images and analyzed to identify behavioral patterns of malicious and legitimate DLLs. These patterns highlight features of DLLs that can be applied as heuristics to help identify malicious injected DLLs in Windows 7 memory. They also establish that machine learning is a viable approach for classifying injected DLLs in Windows memory.